This document shows my approach to understanding precipitation in Santa Barbara County: accessing and analyzing precipitation data sets, then applying that analysis to the question of how precipitation in the county affects the water level of the Cachuma Reservoir.
Rainfall data was gathered from the Santa Barbara County website.
This data consisted of 81 separate .xls files, each from a different rainfall gauge in Santa Barbara County, which I put into a single folder. The data span the years 1899 to 2021, with daily rainfall totals measured in inches. Latitude and longitude coordinates and elevation in feet were also provided for each station id.
I then created a function to read and clean these files. A for loop was used to apply this function to all 81 files and put all of the data into a single data frame.
A separate function and for loop were used to extract the location data for each rainfall gauge.
After this process I had two data frames: one with daily rainfall totals for each date by station id, and another with the latitude and longitude coordinates of each station (rainfall gauge).
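The read-and-combine step described above could look roughly like the following. This is a minimal sketch, not the code used for the analysis: the function name `read_gauge_file`, the column names, and the cleaning steps are assumptions, and two small CSV files stand in for the real .xls files (which would need an Excel reader such as `readxl::read_excel()`).

```r
# Hypothetical stand-in data: two small CSV files in a temporary folder.
# The real inputs are 81 .xls files whose layout is not shown in this document.
data_dir <- file.path(tempdir(), "rain_gauges")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(station_id = 101,
                     date = c("2020-01-01", "2020-01-02"),
                     daily_rain = c(0.25, NA)),
          file.path(data_dir, "101.csv"), row.names = FALSE)
write.csv(data.frame(station_id = 102,
                     date = c("2020-01-01", "2020-01-02"),
                     daily_rain = c(0.10, 0.00)),
          file.path(data_dir, "102.csv"), row.names = FALSE)

# Read one file and clean it (assumed steps: parse dates, drop missing totals).
read_gauge_file <- function(path) {
  raw <- read.csv(path, stringsAsFactors = FALSE)
  raw$date <- as.Date(raw$date)
  raw[!is.na(raw$daily_rain), ]
}

# Apply the reader to every file and stack the results into one data frame.
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
rain_list <- vector("list", length(files))
for (i in seq_along(files)) {
  rain_list[[i]] <- read_gauge_file(files[i])
}
rain_daily <- do.call(rbind, rain_list)
```

Collecting the per-file results in a list and binding them once at the end avoids growing a data frame inside the loop, which matters more as the number of files increases.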
The next step was to wrangle this data to produce data frames which could be used for analysis.
I first used ggmap::get_stamenmap() to get a map of Santa Barbara County.
I then created another data frame holding, for each station id, the yearly rainfall total averaged over all of the years in the data set.
This allowed me to plot these points onto the map.
Average yearly precipitation in Santa Barbara County by location from 1899-2021
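The averaging step can be sketched in base R as two aggregations. The data and column names below are hypothetical, and grouping by calendar year is an assumption (a water-year grouping would need a different `year` column).

```r
# Hypothetical daily rainfall records (inches) for two stations over two years.
rain_daily <- data.frame(
  station_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  date = as.Date(c("2019-01-05", "2019-02-10", "2020-01-15", "2020-03-01",
                   "2019-01-05", "2019-02-10", "2020-01-15", "2020-03-01")),
  daily_rain = c(1.0, 2.0, 3.0, 4.0, 0.5, 0.5, 1.0, 2.0)
)
rain_daily$year <- as.integer(format(rain_daily$date, "%Y"))

# Step 1: total rainfall per station and year.
yearly_totals <- aggregate(daily_rain ~ station_id + year,
                           data = rain_daily, FUN = sum)

# Step 2: average those yearly totals across years for each station.
avg_yearly <- aggregate(daily_rain ~ station_id,
                        data = yearly_totals, FUN = mean)
names(avg_yearly)[2] <- "avg_yearly_rain"

# Attach (hypothetical) coordinates so the averages can be drawn on a map.
locations <- data.frame(station_id = c(1, 2),
                        lon = c(-119.9, -120.1),
                        lat = c(34.5, 34.7))
avg_yearly_located <- merge(avg_yearly, locations, by = "station_id")
```

The resulting data frame has one row per station, which is the shape needed to plot one point per gauge.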
The next step was to spatially interpolate the data to get a better idea of the precipitation in the entire county.
I decided to use the form of spatial interpolation known as Ordinary Kriging.
The first step in this process was to create a variogram, which describes how the similarity between measurements decreases with the distance between them (the spatial dependence). This was done using gstat::variogram(). The function automap::autofitVariogram() was then used to choose the variogram model that best fits the data. Next, after defining a target grid, gstat::krige() was used to generate the set of predictions on that grid.
Spatial interpolation of average yearly precipitation in Santa Barbara County from 1899-2021
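The variogram, model-fitting, and kriging steps could be chained roughly as follows. This is a sketch under stated assumptions, not the analysis code: the station values are synthetic, the coordinates sit on an arbitrary planar grid rather than real longitude and latitude, and the `sp` classes are one of several ways to hand spatial data to `gstat`.

```r
library(sp)
library(gstat)
library(automap)

# Hypothetical station data on an arbitrary planar grid (not the real gauges).
set.seed(1)
stations <- data.frame(x = runif(40, 0, 100), y = runif(40, 0, 100))
stations$avg_yearly_rain <- 15 + 5 * sin(stations$x / 20) +
  3 * cos(stations$y / 25) + rnorm(40, sd = 0.5)
coordinates(stations) <- ~ x + y   # promote to a SpatialPointsDataFrame

# Empirical variogram: semivariance as a function of separation distance.
emp_vgm <- variogram(avg_yearly_rain ~ 1, stations)

# Let automap choose and fit a variogram model; the result is in $var_model.
fit <- autofitVariogram(avg_yearly_rain ~ 1, stations)

# Target grid of prediction locations (21 x 21 cells).
grid <- expand.grid(x = seq(0, 100, by = 5), y = seq(0, 100, by = 5))
coordinates(grid) <- ~ x + y
gridded(grid) <- TRUE

# Ordinary kriging: the formula `~ 1` specifies a constant, unknown mean.
kriged <- krige(avg_yearly_rain ~ 1, stations,
                newdata = grid, model = fit$var_model)
```

The returned object carries a prediction (`var1.pred`) and a kriging variance (`var1.var`) for every grid cell; the variance is worth mapping alongside the prediction, since it grows in areas far from any gauge.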
To get a better understanding of how precipitation has been changing over the years, I looped this process over a subset of years.
Spatially interpolated yearly precipitation in Santa Barbara County
With the precipitation data sorted out, I could now apply it to understanding the effect of precipitation on water level change in the Cachuma Reservoir. The Cachuma Reservoir is heavily relied upon by the city of Santa Barbara, which is entitled to 32.19% of its available water. Understanding how rainfall affects this reservoir is vitally important.
I found reservoir level data for the Cachuma Reservoir on the County of Santa Barbara Public Works website.
Fairly consistent data was provided going back to the year 2015. Reservoir levels were measured at 15-minute intervals, in feet.
With this data, I was able to make some quick plots.
Cachuma Reservoir levels and level changes over time
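Reducing 15-minute readings to daily levels and monthly level changes might be done as below. This is a minimal sketch on synthetic data; the column names are hypothetical, and defining the monthly change as the last daily level minus the first daily level in the month is an assumption.

```r
# Hypothetical 15-minute reservoir levels (feet) spanning two months.
times <- seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
             as.POSIXct("2020-02-29 23:45", tz = "UTC"), by = "15 min")
reservoir <- data.frame(time = times,
                        level_ft = seq(700, 706, length.out = length(times)))

# Daily level: mean of the 15-minute readings within each day.
reservoir$date <- as.Date(reservoir$time, tz = "UTC")
daily_level <- aggregate(level_ft ~ date, data = reservoir, FUN = mean)

# Monthly change: last daily level minus first daily level in each month
# (an assumed definition; daily_level is already sorted by date).
daily_level$month <- format(daily_level$date, "%Y-%m")
monthly_change <- aggregate(level_ft ~ month, data = daily_level,
                            FUN = function(x) x[length(x)] - x[1])
names(monthly_change)[2] <- "monthly_level_change"
```

Aggregating to daily means first smooths out sensor noise in individual readings before the monthly differences are taken.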
After matching up the reservoir data with the precipitation data, I wanted to see which stations recorded rainfall that correlated most strongly with changes in reservoir water level. To do this, I created a function that assigns an r squared value to a station based on a regression of the monthly change in reservoir level on the monthly total rainfall at that station.
Initially I performed a simple linear regression using the lm() function. Later on in the analysis process, I found that a second-degree polynomial regression, done using lm(y ~ poly(x, 2)), was able to fit the data much better. This is why I used a polynomial regression model for this function. The equation for this model is \[\operatorname{monthly\_level\_change} = \alpha + \beta_{1}(\operatorname{month\_precip}) + \beta_{2}(\operatorname{month\_precip})^{2} + \epsilon\]
I then looped the function over all of the 81 stations and put the results into a new data frame.
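A function of this kind, and the loop over stations, could be sketched as follows. The data are synthetic and the names `station_r_squared` and `monthly` are hypothetical; only the model formula mirrors the one reported in this document. Note that `poly()` uses orthogonal polynomials by default, so the fitted coefficients are not the raw α, β₁, β₂ of the equation above, although the fitted values and r squared are the same as with `raw = TRUE`.

```r
# R-squared of a quadratic regression of monthly level change on monthly rainfall.
station_r_squared <- function(df) {
  fit <- lm(monthly_level_change ~ poly(month_precip, 2), data = df)
  summary(fit)$r.squared
}

# Hypothetical data: one shared reservoir series, three stations whose rainfall
# tracks the "true" driving rainfall with increasing amounts of noise.
set.seed(42)
true_precip <- runif(36, 0, 10)
level_change <- -1 + 0.5 * true_precip + 0.2 * true_precip^2 + rnorm(36)
make_station <- function(id, noise_sd) {
  data.frame(station_id = id,
             month_precip = pmax(true_precip + rnorm(36, sd = noise_sd), 0),
             monthly_level_change = level_change)
}
monthly <- rbind(make_station(1, 0.1), make_station(2, 2), make_station(3, 5))

# Loop over stations and collect one r squared value per station.
station_ids <- unique(monthly$station_id)
results <- data.frame(station_id = station_ids, r_squared = NA_real_)
for (i in seq_along(station_ids)) {
  one_station <- monthly[monthly$station_id == station_ids[i], ]
  results$r_squared[i] <- station_r_squared(one_station)
}

# Station with the strongest fit.
best <- results[which.max(results$r_squared), ]
```

Preallocating `results` and filling it by index keeps the loop simple and leaves one row per station, ready to be joined to the station coordinates.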
With these r squared values, I thought a good way to visualize this would be with another spatial interpolation. This interpolation would show differences in correlation between rainfall and reservoir water level change across the county.
Correlation between precipitation and reservoir level change based on r squared value attained through polynomial regression analysis
The station with the highest correlation was station 238, with an r squared value of 0.844.
I chose to use station 238 for the remainder of my analysis.
Here is the summary of the polynomial regression model for station 238.
##
## Call:
## lm(formula = monthly_level_change ~ poly(month_precip, 2), data = station_238_monthly)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -11.5733  -0.5385   0.1320   1.1200   9.2592
##
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)
## (Intercept)              1.5817     0.5054   3.130  0.00307 **
## poly(month_precip, 2)1  47.5492     3.5622  13.348  < 2e-16 ***
## poly(month_precip, 2)2  27.2486     3.5544   7.666 1.06e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.495 on 45 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.8438, Adjusted R-squared:  0.8368
## F-statistic: 121.5 on 2 and 45 DF,  p-value: < 2.2e-16
I then put together a graph of the data with the fitted model, as well as graphs of the residuals.
Relationship between precipitation at station 238 and reservoir water level change
Revisiting the plots I made of the Cachuma Reservoir water levels earlier, I could now confidently add the precipitation data from station 238.
Daily and monthly interaction between precipitation at station 238 and reservoir water level
While it first seemed reasonable that station 238 would have the highest correlation with change in reservoir level, a closer examination of the terrain suggests that station 238 is actually located outside of the watershed feeding the reservoir. I believe this makes its correlation even more interesting. My assumption is that precipitation at station 238 mimics the precipitation nearby on the other side of the ridge just to the south, an area that does drain into Cachuma Reservoir. Because there are few rain gauge stations in that area at a similar elevation (there is one nearby, but it is at a higher elevation), it seems possible that station 238 is the best available representation of precipitation just on the other side of the ridge. It would be interesting to investigate this further in the future with the help of more data and the installation of more rain gauge stations.
“SB County Public Works Water Resources Hydrology - Daily Rainfall XLS.” n.d. Accessed November 30, 2021. https://www.countyofsb.org/pwd/dailyrainfall.sbc.
“Sensor.” n.d. Accessed November 30, 2021. https://rain.cosbpw.net/sensor/?time_zone=US%2FPacific&site_id=105&site=70729dd9-97d4-430a-9271-7b6c195b49be&device_id=1&device=5d7a3129-708d-4881-9886-f84c6686ab41&data_start=2012-10-29%2000%3A00%3A00&data_end=2012-11-28%2023%3A59%3A59&bin=3600&range=Custom%20Range&markers=false&legend=true&thresholds=true&refresh=off&show_raw=true&show_quality=true.